AD-A068492


Systems Optimization Laboratory


DDC
MAY 9 1979

This document has been approved for public release and sale; its distribution is unlimited.


Department  of  Operations  Research 
Stanford  University 
Stanford,  CA  94305 


DDC FILE COPY

SYSTEMS OPTIMIZATION LABORATORY

DEPARTMENT  OF  OPERATIONS  RESEARCH 
Stanford  University 
Stanford,  California 
94305 



GLOBALLY CONVERGENT ALGORITHMS FOR CONVEX PROGRAMMING


Eric Rosenberg


TECHNICAL  REPORT  SOL  79-1 
February  1979 




Research and reproduction of this report were partially supported by
the Department of Energy Contract DE-AS[illegible]34; the Office of Naval
Research Contract N00014-75-C-0[illegible]7; and the National Science Foundation
Grants MCS76-81259 A01 and MCS76-20019 A01.

Reproduction  in  whole  or  in  part  is  permitted  for  any  purposes  of  the 
United  States  Government.  This  document  has  been  approved  for  public 
release  and  sale;  its  distribution  is  unlimited. 




Many of these algorithms can be classified as primal approximation
methods. These methods treat the given problem, hereafter referred to
as the primal, by using the current estimate of a primal solution,
possibly together with other information, such as estimates of the Lagrange
multipliers, to form a constrained minimization subproblem which in some
way approximates the primal. The procedure of solving a sequence of
such approximating subproblems, and perhaps executing other tasks, we
call recursive substitution. For example, with $x_i$ as the current
estimate of a primal solution, we might solve the quadratic subproblem
obtained by linearizing each constraint and the objective function about
$x_i$ and adding to the objective function the term $(x-x_i)^T G_i (x-x_i)$,
where $G_i$ is a positive definite matrix that approximates the Hessian
of the Lagrangian at a Karush-Kuhn-Tucker (K.K.T.) pair [5,6,22,29].

Various non-quadratic subproblems have also been proposed [15,16,24,27].

All of the algorithms proposed in these references are pure
recursive substitution schemes, that is, schemes which set $x_{i+1}$ equal
to a solution of the approximating subproblem generated from $x_i$. Other
methods require additional computation, such as a line search, to
generate $x_{i+1}$ [7,19,20,50]. Furthermore, the methods of [15,16,24,27,29]
are one-point methods, that is, methods that use only information
at the current point. Methods that use quasi-Newton updates [5,6,22]
are not one-point methods, since the Hessian approximation depends on
previous estimates of a K.K.T. pair.

It  is  reasonable  to  expect  a recursive  substitution  scheme 
to  be  effective  if  each  subproblem  can  be  easily  solved.  Notice  that 
in  general  a trade-off  is  inevitable:  the  easier  a particular  type  of 
subproblem  is  to  solve,  the  less  it  tends  to  resemble  the  primal,  and 
consequently  the  more  subproblems  we  expect  to  have  to  solve.  The 
primal  itself  is  of  course  a perfect  approximation  and  presumably  is 
difficult  to  solve,  while  the  subproblem  formed  by  linearizing  the 
constraints and objective function can be easily solved by linear
programming, but may be a poor approximation, especially if the functions
defining  the  primal  are  highly  nonlinear.  In  particular,  approximating 
a geometric  program  by  a linear  program  [5]  can  be  disastrous,  and 
often  it  is  desirable  to  approximate  a geometric  program  by  another 
geometric  program,  whose  constraints  and  objective  function  will  in 
general  be  nonlinear  functions  [28].  In  this  case,  each  approximating 
geometric  program  has  the  advantage  of  a smaller  degree  of  difficulty 
than  the  given  problem.  There  is  therefore  a need  to  study  general 
approximating  subproblems. 

In order for any algorithm to be used with confidence, it is
necessary to determine under what conditions, if any, the algorithm
generates a sequence of estimates that converges to a solution, and,
if convergence can be established, it is important to determine the
rate of convergence. Most algorithms popular today, and in particular
most pure recursive substitution schemes, exhibit local convergence.
That is, for any starting primal solution estimate $x_0$ in some
neighborhood of a primal stationary point z, the algorithm generates
a sequence $\{x_i\}$ that converges to z. A local convergence proof
generally requires strong differentiability assumptions and a good
estimate of a vector of Lagrange multipliers at z. Rate of convergence
results are necessarily local results, and in fact are usually established
in the course of proving local convergence. In [25], local convergence
and rate of convergence results are derived for methods utilizing
arbitrary, possibly non-quadratic, approximating subproblems in a
one-point recursive substitution scheme.

Few researchers have considered the question of global
convergence. We will say that a nonlinear programming algorithm is globally
convergent if, for any arbitrary starting primal solution estimate $x_0$,
the algorithm generates a sequence $\{x_i\}$ that converges to a primal
stationary point. A globally convergent algorithm is extremely desirable,
for a locally convergent method might fail miserably if provided with
a poor initial estimate, and a feasible direction method [55] requires
an initial feasible point, which might be unavailable or difficult
to compute.

Recently, a globally convergent algorithm employing quadratic
subproblems has been proposed [7]. Under appropriate hypotheses, the
solution of each quadratic subproblem is shown to generate a descent
direction of an exact penalty function $\theta_\rho$, where $\rho$ is a fixed
positive real number. That is, let $x_i$ be the current estimate of
a solution, let $z_i$ solve the quadratic subproblem constructed from $x_i$
and some positive definite matrix, as described above, and let
$d_i = z_i - x_i$. Then $D_{d_i}\theta_\rho(x_i) < 0$, where $D_d f(x)$ denotes
the directional derivative of the function f at the point x in the
direction d. The new estimate is then $x_{i+1} = x_i + \alpha_i d_i$, where

$$\theta_\rho(x_i + \alpha_i d_i) = \min_{0 \le \alpha \le \bar\alpha} \theta_\rho(x_i + \alpha d_i)$$

and $0 < \bar\alpha \le +\infty$. We then solve the quadratic subproblem
constructed from $x_{i+1}$ and a new positive definite matrix, and continue
in this fashion. For each sufficiently large $\rho$, this scheme is globally
convergent. Moreover, by using Lagrange multiplier estimates and choosing
each $G_i$ properly, in some neighborhood of a K.K.T. pair (z,u) the line
search can be omitted, and the pure recursive substitution scheme itself
generates a sequence $\{(x_i, v_i)\}$ that converges to (z,u) [5,6,22].
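To make the scheme of [7] described above concrete, here is a minimal Python sketch for a scalar problem. It is an illustration under assumptions of our own, not the algorithm of [7] as published: the test problem (minimize $x$ subject to $x^2 - 1 \le 0$, with solution $z = -1$ and multiplier $u = 1/2$), the constant "Hessian approximation" $G = 1$, the penalty parameter $\rho = 10$, and the brute-force line search restricted to $0 \le \alpha \le 1$ are all hypothetical choices. The closed-form subproblem solution in `qp_step` is valid only for one variable and one constraint.

```python
# Sketch only: a scalar instance of "QP subproblem + exact penalty line search".
# Hypothetical primal: minimize f0(x) = x subject to f1(x) = x**2 - 1 <= 0,
# whose solution is z = -1 with Lagrange multiplier u = 1/2.


def f0(x):
    return x


def df0(x):
    return 1.0


def f1(x):
    return x * x - 1.0


def df1(x):
    return 2.0 * x


def theta(x, rho):
    """Exact penalty function theta_rho(x) = f0(x) + rho * max(0, f1(x))."""
    return f0(x) + rho * max(0.0, f1(x))


def qp_step(x, G):
    """Return d = z - x, where z solves the scalar quadratic subproblem built at x:
         minimize   df0(x) * d + 0.5 * G * d**2
         subject to f1(x) + df1(x) * d <= 0.
    """
    g, c, a = df0(x), f1(x), df1(x)
    d = -g / G                 # unconstrained minimizer of the quadratic model
    if c + a * d <= 0.0:       # linearized constraint already satisfied
        return d
    # Otherwise the linearized constraint is active.  (Here a != 0, because
    # a == 0 together with c > 0 would make the subproblem infeasible.)
    return -c / a


def solve(x0, rho=10.0, G=1.0, iters=20):
    """Recursive substitution with a grid line search on theta_rho over 0 <= alpha <= 1."""
    alphas = [j / 1000.0 for j in range(1001)]
    x, history = x0, [x0]
    for _ in range(iters):
        d = qp_step(x, G)
        if abs(d) < 1e-12:     # the subproblem reproduces x: stop
            break
        alpha = min(alphas, key=lambda t: theta(x + t * d, rho))
        x = x + alpha * d
        history.append(x)
    return x, history


x_final, hist = solve(3.0)
print(x_final)
```

Starting from the infeasible point $x_0 = 3$, the iterates approach $z = -1$, and the penalty value never increases because $\alpha = 0$ is always among the candidate step lengths.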

In  this  paper  we  will  generalize  the  results  of  [7]  to  prove 
global  convergence  for  recursive  substitution  schemes  utilizing 
arbitrary,  possibly  non-quadratic,  approximating  subproblems.  Alternatively, 
our  results  can  be  viewed  as  the  global  version  of  the  results  of  [25], 
without  the  restriction  to  one-point  schemes.  We  will  restrict  our 
attention  to  solving  a convex  primal  with  convex  subproblems  so  that 
we  can  employ  the  full  power  of  convex  analysis  and  thereby  determine 
the  minimum  hypotheses  needed  to  guarantee  global  convergence.  In 
particular,  the  functions  defining  the  primal  and  each  subproblem  need 
not  be  differentiable.  Our  results  also  prove  global  convergence  of 
a new  algorithm  for  geometric  programming  [28]. 


This  paper  is  divided  into  six  sections.  In  the  next  two 
sections  we  examine  the  connection  between  constrained  optimization 
problems  and  exact  penalty  functions.  In  Section  4 we  consider  the 
directional  derivative  of  the  maximum  of  a finite  collection  of  convex 
functions.  We  present  the  global  convergence  theorem  in  Section  5. 
Section  6 is  devoted  to  concluding  remarks. 


2.  Exact  Penalty  Functions;  Part  1 

Our goal is to solve convex program C:

$$\begin{aligned}\text{minimize}\quad & f_0(x)\\ \text{subject to}\quad & f_k(x) \le 0,\quad k = 1,2,\ldots,p,\end{aligned}$$

where, for each k = 0,1,...,p, $f_k: R^m \to R$ is a convex function. The
solution set of C is the set of all points that solve C.

We associate with program C the exact penalty function $\theta_\rho$,
defined by

$$\theta_\rho(x) = f_0(x) + \rho \sum_{k=1}^{p} \max(0, f_k(x)),$$

where $\rho$ is a positive real number. The minimum set of $\theta_\rho$ is
the set of points that minimize $\theta_\rho$. We call $\theta_\rho$ an exact
penalty function because, if the functions $\{f_k\}$ are differentiable,
then for each sufficiently large $\rho$ the minimum set of $\theta_\rho$ and the
solution set of C coincide [8,14,21,31]. Notice that finite values of $\rho$
suffice, in contrast to the classical exterior penalty function methods
requiring that $\rho$ approach infinity [14]. Exact penalty functions have
been extensively studied since 1967; a good bibliography can be found in [8].
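The exactness property can be seen on a one-variable toy program. The sketch below assumes the hypothetical program "minimize $x^2$ subject to $1 - x \le 0$" (solution $x = 1$, Lagrange multiplier $u = 2$) and locates a minimizer of $\theta_\rho$ by brute-force grid search; the grid and the sample values of $\rho$ are arbitrary choices. In this toy case the threshold happens to be exactly $u = 2$; the results of this section only assert that some finite threshold exists.

```python
# Hypothetical toy program C: minimize x**2 subject to 1 - x <= 0.
# Its solution is x = 1, with Lagrange multiplier u = 2.


def theta(x, rho):
    """Exact penalty function theta_rho(x) = f0(x) + rho * max(0, f1(x))."""
    return x * x + rho * max(0.0, 1.0 - x)


def grid_argmin(rho, lo=-2.0, hi=3.0, n=50000):
    """Approximate a minimizer of theta_rho by brute-force search on a uniform grid."""
    xs = [lo + (hi - lo) * i / n for i in range(n + 1)]
    return min(xs, key=lambda x: theta(x, rho))


# For rho < 2 the minimizer is the infeasible point rho / 2;
# for rho > 2 it is the solution x = 1 of program C.
for rho in (0.5, 1.0, 2.5, 5.0):
    print(rho, round(grid_argmin(rho), 3))
```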

In this section and the next, we consider the relationship
between a compact family of convex programs and their associated exact
penalty functions. Inspired by [31], we impose no differentiability
assumptions. The proof of Theorem 1 is also fashioned after [31].

We denote an infinite sequence in $R^m$ by $\{x_i\}$. Where no
confusion can arise, we also write $x = (x_1, x_2, \ldots, x_m)$. By $x \ge 0$
we mean $x_j \ge 0$, j = 1,2,...,m. If $x, y \in R^m$, by $\langle x, y\rangle$ we mean
$\sum_{j=1}^m x_j y_j$, by $\|x\|_\infty$ we mean $\max_{1 \le j \le m} |x_j|$, and by $\|\cdot\|$ we mean
the Euclidean norm $\|x\| = \langle x, x\rangle^{1/2}$. If $X, Y \subset R^m$ and $\alpha \in R$, by X + Y we
mean $\{x + y \mid x \in X \text{ and } y \in Y\}$, and by $\alpha X$ we mean $\{\alpha x \mid x \in X\}$.
Throughout this paper, all functions map all of $R^m$ into R, for
some m, and, unless otherwise noted, are finite valued. We say that
the function $f: R^m \to R \cup \{+\infty\}$ is convex if its epigraph
$\{(x,\alpha) \mid f(x) \le \alpha\}$ is convex as a subset of $R^{m+1}$. We observe that
a finite valued convex function is necessarily continuous (Corollary
10.1.1, [26]). We say that convex program C is superconsistent if
for some $x^0$ we have $f_k(x^0) < 0$, k = 1,2,...,p. The end of a proof
will be denoted by ∎.
We begin with a review of point-to-set maps [9,10,11,15,16,
17,18,32]. Let $S \subset R^n$ and $X \subset R^m$. A point-to-set map M sends
the point s in S to the subset M(s) of X. If $s \in S$, the map
M is said to be closed at s if $\{s_i\} \subset S$, $s_i \to s$, $x_i \in M(s_i)$, and
$x_i \to x$ imply $x \in M(s)$. The map M is said to be uniformly
bounded near s if there is an open neighborhood N of s such that
the set $\bigcup_{y \in N \cap S} M(y)$ is bounded, and M is said to be nonempty near
s if there is an open neighborhood N of s such that M(y) is
nonempty whenever $y \in N \cap S$. If $T \subset S$, then M is said to be closed
on T, uniformly bounded near T, or nonempty near T if for each s
in T the map M is closed at s, uniformly bounded near s, or
nonempty near s, respectively.

We shall treat the subset S of $R^n$ as a perturbation space.
For each fixed s in S, we consider program C(s):

$$\begin{aligned}\underset{x}{\text{minimize}}\quad & f_0(x,s)\\ \text{subject to}\quad & f_k(x,s) \le 0,\quad k = 1,2,\ldots,p,\end{aligned}$$

where

$$f_k: R^m \times R^n \to R,\quad k = 0,1,\ldots,p.$$

We define the following maps:

$$\Phi(s) = \{x \mid f_k(x,s) \le 0,\ k = 1,2,\ldots,p\},$$

$\Phi$ is a feasible region map;
LEMMA 2. Let M be a point-to-set map from $R^n$ to subsets of $R^m$, and let
$f: R^m \to R$ be a function. Let $v(s) = \sup\{f(x) \mid x \in M(s)\}$. If M is closed at
s and uniformly bounded near s, and if f is continuous on $R^m$,
then v is upper semicontinuous at s.

Proof. See Theorem 5, Hogan [9]. ∎

For each s in S, we associate with program C(s) the exact
penalty function $\theta_\rho(\cdot,s)$, defined on $R^m$ by

$$\theta_\rho(x,s) = f_0(x,s) + \rho \sum_{k=1}^{p} \max(0, f_k(x,s)).$$


THEOREM 1. Suppose program C is superconsistent. Let $f_0, f_1, \ldots, f_p$
be functions jointly continuous on $R^m \times R^n$ such that for each fixed
s and k = 0,1,...,p the function $f_k(\cdot,s)$ is convex on $R^m$, and
such that for each x, s, and k = 1,2,...,p we have $f_k(x,s) \le f_k(x)$.
Let S be a nonempty and compact subset of $R^n$ such that the solution
set of C(s) is nonempty and bounded whenever $s \in S$. Then there is
a positive number $\rho_1$ such that, whenever $\rho > \rho_1$ and $s \in S$, each
minimum of the function $\theta_\rho(\cdot,s)$ is also a solution of program C(s).

Proof. Since C is superconsistent, for some $x^0$ we have
$f_k(x^0,s) \le f_k(x^0) < 0$ for each s in S and k = 1,2,...,p.
Hence, C(s) is superconsistent for each s in S. Let

$$\alpha = \max_{s \in S}\ \max_{1 \le k \le p} f_k(x^0, s).$$

Then $\alpha \le \max_{1 \le k \le p} f_k(x^0) < 0$.

By Lemma 1, the optimal value function $\omega$ is continuous on S.
Therefore, $\beta = \min_{s \in S} \omega(s)$ for some finite number $\beta$. Let

$$\gamma = \max_{s \in S} f_0(x^0, s)$$

and let

It is clear that $\rho_1 > 0$. We claim that $\rho_1$ is the desired threshold
value. To see this, choose s in S and choose $\rho > \rho_1$. Suppose
that the point v is infeasible for C(s). Since $f_0(\cdot,s)$ and
$\theta_\rho(\cdot,s)$ agree at feasible points of C(s), to establish the theorem
it suffices to find a point z which is feasible for C(s) such that
$\theta_\rho(z,s) < \theta_\rho(v,s)$, for then $\theta_\rho(\cdot,s)$ must attain its minimum in the
feasible region of C(s).

By definition of $x^0$ and v, there is a point $x^B$ on the
line segment joining $x^0$ and v such that $x^B$ is on the boundary
of the feasible region of C(s). Let $K = \{k \in \{1,2,\ldots,p\} \mid f_k(x^B,s) = 0\}$,
and let the auxiliary function $\varphi$ be defined on $R^m$ by

$$\varphi(x) = f_0(x,s) + \rho \sum_{k \in K} f_k(x,s).$$

It follows that $\varphi(x^B) = \theta_\rho(x^B,s) = f_0(x^B,s)$.

Since $f_k(v,s) > 0$ whenever $k \in K$, we have $\varphi(v) \le \theta_\rho(v,s)$.

Since $\varphi$ is convex and $x^B = t x^0 + (1-t) v$ for some t in
(0,1), we have

$$\varphi(x^B) \le t\,\varphi(x^0) + (1-t)\,\varphi(v) < t\,\varphi(x^B) + (1-t)\,\varphi(v),$$

or equivalently, $\varphi(x^B) < \varphi(v)$. Since we have shown that $\varphi(v) \le \theta_\rho(v,s)$
and since $\varphi(x^B) = \theta_\rho(x^B,s)$, it follows that $\theta_\rho(x^B,s) < \theta_\rho(v,s)$,
which proves the theorem. ∎

COROLLARY 1.1. Suppose that program C is superconsistent and has a
nonempty solution set. Then there is a positive number $\rho_1$ such that,
whenever $\rho > \rho_1$, each minimum of the function $\theta_\rho$ is also a solution
of program C.

Proof. The result follows from Theorem 1 by deleting all references
to the variable s and the set S and replacing each $f_k(x,s)$ with $f_k(x)$
for k = 0,1,...,p. Notice that the result holds even if the
solution set of C is unbounded. (This corollary appears in [31].) ∎


3.  Exact  Penalty  Functions;  Part  2. 

To prove the converse of Theorem 1, we will require several
results from convex analysis [26]. Let f be a convex function. The
vector $x^*$ is said to be a subgradient of f at the point x if
$f(y) \ge f(x) + \langle x^*, y - x\rangle$ for every y. The set of all subgradients of
f at x is called the subdifferential of f at x, and is denoted
by $\partial f(x)$. For each x, $\partial f(x)$ is a nonempty and compact convex set
(Theorem 23.4, [26]). Moreover, f is differentiable at x if and
only if $\partial f(x) = \{\nabla f(x)\}$. Clearly, $\partial(\alpha f)(x) = \alpha\, \partial f(x)$ for each x
and each positive number $\alpha$.

LEMMA 3. Let $f_1, f_2, \ldots, f_n$ be convex functions and let
$f = f_1 + f_2 + \cdots + f_n$. Then for each x we have

$$\partial f(x) = \partial f_1(x) + \partial f_2(x) + \cdots + \partial f_n(x).$$

Proof. See Theorem 23.8, Rockafellar [26]. ∎

We say that the function $f: R^m \to R \cup \{+\infty\}$ is proper if f
is convex and if $f(x) < +\infty$ for at least one x. If $f: R^m \to R \cup \{+\infty\}$
is a convex function, we define the closure of f to be that function
whose epigraph is the closure in $R^{m+1}$ of the epigraph of f, and we
say that f is closed if it coincides with its closure. It
follows that a proper convex function is closed if and only if it is
lower semicontinuous.

Let $f: R^m \to R \cup \{+\infty\}$ be a convex function. The conjugate
function $f^*$ is defined on $R^m$ by $f^*(x^*) = \sup\{\langle x^*, x\rangle - f(x) \mid x \in R^m\}$.
The conjugate $f^*$ is a closed convex function, proper if and only if
f is proper. If f is a closed proper convex function, then the
conjugate of $f^*$ is f, that is, $(f^*)^* = f$. Therefore, the conjugacy
operation $f \to f^*$ induces a one-to-one symmetric correspondence in
the class of all closed proper convex functions on $R^m$. Since $f^*$
may not be finite valued, the subdifferential $\partial f^*(x^*)$ may be empty
for some $x^*$. However, if f is a closed proper convex function,
then $x \in \partial f^*(x^*)$ if and only if $x^* \in \partial f(x)$.

If X is a convex set in $R^m$, the indicator function of X,
denoted by $\delta(\cdot|X)$, is defined on $R^m$ by

$$\delta(x|X) = \begin{cases} 0 & \text{if } x \in X,\\ +\infty & \text{otherwise.}\end{cases}$$

There is an obvious one-to-one correspondence between a convex set
and its indicator function, namely, $\delta(x|X_1) = \delta(x|X_2)$ for every x
if and only if $X_1 = X_2$. The conjugate transform of $\delta(\cdot|X)$ is called
the support function of X. We have $\delta^*(x^*|X) = \sup\{\langle x^*, x\rangle - \delta(x|X) \mid x \in R^m\}$
$= \sup\{\langle x^*, x\rangle \mid x \in X\}$. If X is also closed, then $\delta(\cdot|X)$ and $\delta^*(\cdot|X)$
are conjugate to each other (Theorem 13.2, [26]). Therefore, if $X_1$
and $X_2$ are closed convex sets, we have $\delta^*(x^*|X_1) = \delta^*(x^*|X_2)$ for
every $x^*$ if and only if $X_1 = X_2$.

Let f be a (finite valued) convex function. It can be shown
(Theorem 23.4, [26]) that, for each x and d and each sequence
$\{\alpha_i\} \subset R$ with $0 < \alpha_{i+1} \le \alpha_i$ and $\lim_{i\to\infty}\alpha_i = 0$,

$$\lim_{i\to\infty} \frac{f(x + \alpha_i d) - f(x)}{\alpha_i}$$

exists and is finite. We call this limit the directional derivative
of f in the direction d at the point x, and denote it by
$D_d f(x)$. Moreover, for each fixed x and d,

$$\frac{f(x + \alpha d) - f(x)}{\alpha}$$

is nondecreasing on $\{\alpha \in R \mid \alpha > 0\}$; also, for each $\alpha > 0$ we have
$D_{\alpha d} f(x) = \alpha D_d f(x)$. We say that the direction d is a descent direction
of f at the point x if $D_d f(x) < 0$, in which case the definition of
$D_d f(x)$ as a limit implies $f(x + \alpha d) < f(x)$ for all sufficiently small positive $\alpha$.

LEMMA 4. Let f be a convex function. Then for each x and d
we have $D_d f(x) = \max\{\langle x^*, d\rangle \mid x^* \in \partial f(x)\}$.

Proof. See Theorem 23.4, Rockafellar [26]. ∎
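Lemma 4 and the monotonicity of the difference quotient can be checked on a simple nondifferentiable function. The sketch below assumes the hypothetical example $f(x) = |x_1| + |x_2|$ on $R^2$, whose subdifferential is a product of intervals (by Lemma 3), so the maximum in Lemma 4 splits coordinatewise; the point $x$, direction $d$ and step sizes are arbitrary choices.

```python
# Hypothetical example: f(x) = |x1| + |x2|, convex but not differentiable.


def f(x):
    return abs(x[0]) + abs(x[1])


def difference_quotient(x, d, a):
    """(f(x + a d) - f(x)) / a for a step a > 0."""
    return (f([x[0] + a * d[0], x[1] + a * d[1]]) - f(x)) / a


def dir_deriv_lemma4(x, d):
    """max{<x*, d> : x* in the subdifferential of f at x}.
    By Lemma 3 the subdifferential is the product of the sets {sign(x_j)} (x_j != 0)
    and [-1, 1] (x_j == 0), so the maximum splits coordinatewise."""
    total = 0.0
    for xj, dj in zip(x, d):
        if xj > 0:
            total += dj
        elif xj < 0:
            total -= dj
        else:
            total += abs(dj)   # max of s * dj over s in [-1, 1]
    return total


x, d = [0.0, 1.0], [-1.0, -2.0]
steps = [1.0, 0.5, 0.25, 0.125, 0.0625]
quotients = [difference_quotient(x, d, a) for a in steps]
print(quotients)               # nonincreasing as the step a shrinks
print(dir_deriv_lemma4(x, d))  # -1.0, so d is a descent direction of f at x
```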

Although a stronger version of our next theorem appears in
[23], our result has a particularly simple proof and is adequate
for our purposes.

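THEOREM 2. Let f be a convex function and let g = max(0,f).
Then $\partial g(x)$ is nonempty for every x and

i) $\partial g(x) = \{0\}$ if $f(x) < 0$,

ii) $\partial g(x) \supset \{\alpha x^* \mid 0 \le \alpha \le 1 \text{ and } x^* \in \partial f(x)\}$ if $f(x) = 0$,

iii) $\partial g(x) = \partial f(x)$ if $f(x) > 0$.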
Proof. It follows from the above remarks that $\partial g(x)$ is nonempty,
closed, convex, and bounded for all x.

i) Suppose $f(x) < 0$. Then for each z in some neighborhood of
x we have $g(z) = 0$. Therefore, for each d we have

$$0 = D_d g(x) = \max\{\langle x^*, d\rangle \mid x^* \in \partial g(x)\}.$$

Thus the support function of $\partial g(x)$ coincides with that of {0}, so
$\partial g(x) = \{0\}$, which proves i).

ii) Suppose $f(x) = 0$. Choose $x^*$ in $\partial f(x)$ and $\alpha$ in [0,1]. Then

$$f(y) \ge f(x) + \langle x^*, y - x\rangle = \langle x^*, y - x\rangle$$

for each y. If $\langle x^*, y - x\rangle \ge 0$, then $f(y) \ge 0$, so that

$$g(y) = f(y) \ge \langle x^*, y - x\rangle \ge \langle \alpha x^*, y - x\rangle = g(x) + \langle \alpha x^*, y - x\rangle.$$

On the other hand, if $\langle x^*, y - x\rangle < 0$, then

$$g(y) = \max(0, f(y)) \ge 0 \ge \langle \alpha x^*, y - x\rangle = g(x) + \langle \alpha x^*, y - x\rangle.$$

Hence the subgradient inequality holds for $\alpha x^*$ at every y, that is,
$\alpha x^* \in \partial g(x)$ whenever $\alpha \in [0,1]$, which proves ii).

iii) Suppose $f(x) > 0$. Then for each z in some neighborhood of x
we have $g(z) = f(z)$. Therefore, for each d we have $D_d g(x) = D_d f(x)$.
It follows from Lemma 4 and the above remarks
that $\partial g(x) = \partial f(x)$, which proves iii). ∎
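Part ii) of Theorem 2 can be spot-checked numerically. The sketch below assumes the hypothetical function $f(y) = y^2 - 1$ on $R$, for which $f(1) = 0$ and $\partial f(1) = \{2\}$, and tests the subgradient inequality for $g = \max(0, f)$ on a finite grid; such a grid test can refute, but not prove, that a number is a subgradient. Multiples $\alpha x^*$ with $\alpha \in [0,1]$ pass, while $\alpha = 1.5$ fails for this particular $f$.

```python
# Hypothetical example: f(y) = y**2 - 1, so f(1) = 0 and the subdifferential of f at 1 is {2}.


def f(y):
    return y * y - 1.0


def g(y):
    return max(0.0, f(y))


def passes_subgradient_test(h, x, s, grid, tol=1e-12):
    """Check h(y) >= h(x) + s * (y - x) at every grid point.
    Passing is necessary, not sufficient, for s to be a subgradient of h at x."""
    return all(h(y) >= h(x) + s * (y - x) - tol for y in grid)


grid = [-3.0 + 0.001 * i for i in range(6001)]
x, xstar = 1.0, 2.0
for alpha in (0.0, 0.25, 0.5, 1.0):
    print(alpha, passes_subgradient_test(g, x, alpha * xstar, grid))   # True each time
print(1.5, passes_subgradient_test(g, x, 1.5 * xstar, grid))           # False
```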
A remarkable feature of convex programming is the existence
of necessary and sufficient conditions for optimality, even in the
absence of differentiability. Consider again convex program C:

$$\begin{aligned}\text{minimize}\quad & f_0(x)\\ \text{subject to}\quad & f_k(x) \le 0,\quad k = 1,2,\ldots,p.\end{aligned}$$

If C is actually an unconstrained minimization problem, we call the
solution set the minimum set.

LEMMA 5. Let f be a convex function. Then the minimum set of f
is $\partial f^*(0)$; in particular, the infimum of f is attained if and only
if $\partial f^*(0)$ is nonempty.

Proof. See Theorem 27.1, Rockafellar [26]. ∎
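Lemma 5 and the definition of the conjugate can be illustrated numerically. The sketch below assumes the hypothetical function $f(x) = (x-1)^2/2$ on $R$, whose conjugate is $f^*(y) = y^2/2 + y$; it approximates the supremum defining $f^*$ over a finite grid standing in for $R$ (an assumption that is harmless here because the maximizer $x = y + 1$ lies inside the grid), and compares the slope of $f^*$ at 0 with the minimizer of $f$.

```python
# Hypothetical example: f(x) = (x - 1)**2 / 2, whose conjugate is f*(y) = y**2 / 2 + y.


def f(x):
    return 0.5 * (x - 1.0) ** 2


XS = [-10.0 + 0.001 * i for i in range(20001)]   # a grid standing in for R


def conjugate(y):
    """Approximate f*(y) = sup_x { y*x - f(x) } by a maximum over the grid XS."""
    return max(y * x - f(x) for x in XS)


def conjugate_closed_form(y):
    return 0.5 * y * y + y


for y in (-1.0, 0.0, 2.0):
    print(y, conjugate(y), conjugate_closed_form(y))

# f* is differentiable, so the subdifferential of f* at 0 is the single slope f*'(0);
# by Lemma 5 it should equal the minimum set of f, namely {1}.
h = 1e-3
slope_at_0 = (conjugate(h) - conjugate(-h)) / (2.0 * h)
argmin_f = min(XS, key=f)
print(slope_at_0, argmin_f)
```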


We say that a vector u in $R^p$ is a vector of Lagrange multipliers
for C if $u \ge 0$ and if the infimum of the function $f_0 + u_1 f_1 + \cdots + u_p f_p$
is finite and equal to the optimal objective function value of C. We
define the Lagrangian function L on $R^m \times R^p$ by

$$L(x,v) = \begin{cases} f_0(x) + v_1 f_1(x) + \cdots + v_p f_p(x) & \text{if } v \ge 0,\\ -\infty & \text{otherwise.}\end{cases}$$

The pair (z,u) is said to be a saddlepoint of L (with respect to
maximizing in v and minimizing in x) if for every x and v we have

$$L(z,v) \le L(z,u) \le L(x,u).$$

LEMMA 6. The point z solves C and u is a vector of Lagrange
multipliers for C if and only if (z,u) is a saddlepoint of the
Lagrangian L. This condition holds if and only if the following
Karush-Kuhn-Tucker (K.K.T.) conditions hold:

i) $u_k \ge 0$, $f_k(z) \le 0$, k = 1,2,...,p

ii) $u_k f_k(z) = 0$, k = 1,2,...,p

iii) $0 \in \partial f_0(z) + u_1 \partial f_1(z) + \cdots + u_p \partial f_p(z)$.

(If i), ii), iii) hold, we call (z,u) a K.K.T. pair.) Moreover,
if C is superconsistent, then z solves C if and only if there is a
vector u such that (z,u) is a saddlepoint of the Lagrangian L
(or equivalently, if (z,u) is a K.K.T. pair), and the set of
Lagrange multipliers is identical to the set of points maximizing
(over all v) the function $\min_{x \in R^m} L(x,v)$.

Proof. See Theorems 28.2 and 28.3 and Corollaries 28.3.1 and 28.4.1,
Rockafellar [26]. ∎
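For differentiable data, each subdifferential in condition iii) reduces to a single gradient, and the K.K.T. conditions of Lemma 6 can be checked directly. The sketch below assumes the hypothetical program "minimize $(x_1-2)^2 + (x_2-1)^2$ subject to $x_1 + x_2 - 2 \le 0$", which is superconsistent (take $x = 0$), so by Lemma 6 a K.K.T. pair identifies a solution; the tolerance is an arbitrary choice for floating-point arithmetic.

```python
# Hypothetical program: minimize (x1-2)^2 + (x2-1)^2 subject to x1 + x2 - 2 <= 0.
# Both functions are differentiable, so each subdifferential is a single gradient.


def grad_f0(x):
    return [2.0 * (x[0] - 2.0), 2.0 * (x[1] - 1.0)]


def f1(x):
    return x[0] + x[1] - 2.0


def grad_f1(x):
    return [1.0, 1.0]


def is_kkt_pair(z, u, tol=1e-9):
    """Check conditions i)-iii) of Lemma 6 for a single constraint (p = 1)."""
    feasible = u >= 0.0 and f1(z) <= tol                         # i)
    complementary = abs(u * f1(z)) <= tol                        # ii)
    residual = [g0 + u * g1 for g0, g1 in zip(grad_f0(z), grad_f1(z))]
    stationary = max(abs(r) for r in residual) <= tol            # iii)
    return feasible and complementary and stationary


print(is_kkt_pair([1.5, 0.5], 1.0))   # True: z = (1.5, 0.5) solves the program
print(is_kkt_pair([1.0, 1.0], 1.0))   # False: stationarity fails
```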

We now prove the main result of this section, the converse
of Theorem 1.

THEOREM 3. Under the hypotheses of Theorem 1, there is a nonnegative
number $\rho_2$ such that, whenever $\rho > \rho_2$ and $s \in S$, each solution of
program C(s) is also a minimum of the function $\theta_\rho(\cdot,s)$.
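Proof. By Lemma 1, for each s in S the set of Lagrange multipliers
U(s) of C(s) is nonempty, and the map U is closed at s and uniformly bounded near s.
Therefore, by Lemma 2, for some nonnegative number $\rho_2$ we have

$$\rho_2 = \max_{s \in S}\ \max\{\|u\|_\infty \mid u \in U(s)\}.$$

We claim that $\rho_2$ is the desired constant. To see this,
choose s in S and $\rho > \rho_2$, let z be a solution of C(s), and
let u belong to U(s). Let

$$K_- = \{k \in \{1,2,\ldots,p\} \mid f_k(z,s) < 0\},$$

$$K_0 = \{k \in \{1,2,\ldots,p\} \mid f_k(z,s) = 0\},$$

and

$$K_+ = \{k \in \{1,2,\ldots,p\} \mid f_k(z,s) > 0\}.$$

Then $K_+$ is empty and $u_k = 0$ for each k in $K_-$, by Lemma 6. Hence, by
Lemma 6 again, we have

$$0 \in \partial f_0(z,s) + \sum_{k=1}^{p} u_k\, \partial f_k(z,s) = \partial f_0(z,s) + \sum_{k \in K_0} u_k\, \partial f_k(z,s)$$

$$\subset \partial f_0(z,s) + \rho \sum_{k \in K_0} \{\alpha x^* \mid 0 \le \alpha \le 1 \text{ and } x^* \in \partial f_k(z,s)\} \subset \partial\theta_\rho(z,s),$$

where $\partial$ denotes subdifferentiation with respect to x, the first inclusion
holds with $\alpha = u_k/\rho$ (which lies in [0,1] because $u_k \le \rho_2 < \rho$), and the
final assertion follows from Theorem 2 and Lemma 3. Therefore,
$0 \in \partial\theta_\rho(z,s)$, which implies that $z \in \partial\theta_\rho^*(0,s)$. By Lemma 5,
z is a minimum of $\theta_\rho(\cdot,s)$, which proves the theorem. ∎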
COROLLARY 3.1. Under the hypotheses of Theorem 1, there is a positive
number $\rho_3$ such that, whenever $\rho > \rho_3$ and $s \in S$, the solution set
of C(s) and the minimum set of $\theta_\rho(\cdot,s)$ coincide.

Proof. In light of Theorems 1 and 3, it suffices to choose
$\rho_3 = \max(\rho_1, \rho_2)$. ∎

COROLLARY 3.2. Suppose that program C is superconsistent and has a
nonempty and bounded solution set. Then there is a positive number
$\rho_3$ such that, whenever $\rho > \rho_3$, the solution set of C and the minimum
set of $\theta_\rho$ coincide.

Proof. Half of this corollary follows directly from Corollary 1.1. The
other set containment follows from Theorem 3 by deleting all references
to the variable s and the set S, and replacing each $f_k(x,s)$ with
$f_k(x)$ for k = 0,1,...,p. ∎

4.  Directional  Derivatives 

We next consider the directional derivative of the maximum of
a finite collection of convex functions.

LEMMA 7. Let $f: R^m \to R$ be a convex function. Then for each x and
each $\varepsilon > 0$ there is a $\delta > 0$ such that $\partial f(y) \subset \partial f(x) + \varepsilon B$ whenever
$y \in x + \delta B$, where $B = \{z \in R^m \mid \|z\| \le 1\}$, the unit ball in $R^m$.

Proof. See Corollary 24.5.1, Rockafellar [26]. ∎

LEMMA 8. Let $f: R^m \to R$ be a convex function. Then the point-to-set
map $\partial f$ is closed and uniformly bounded on $R^m$.

Proof. Since $\partial f(x)$ is bounded for each x, it follows immediately
from Lemma 7 that $\partial f$ is uniformly bounded on $R^m$.

To show that $\partial f$ is closed, let $x_i \to x$, let $x_i^* \in \partial f(x_i)$,
and let $x_i^* \to x^*$. Then for every y we have $f(y) \ge f(x_i) + \langle x_i^*, y - x_i\rangle$
for every i. Since f is continuous, it follows that
$f(y) \ge f(x) + \langle x^*, y - x\rangle$. Therefore, $x^* \in \partial f(x)$. ∎

LEMMA 9. Let $\{\beta_i\}$ be an infinite sequence of real numbers such that
$\beta_{i+1} \le \beta_i$ for every i. Suppose some subsequence $\{\beta_j\}$ converges
to some number $\beta$. Then the entire sequence converges to $\beta$.

Proof. Choose $\varepsilon > 0$. Since $\lim_{j\to\infty} \beta_j = \beta$, for some N we have
$\beta_N < \beta + \varepsilon$. Hence,

$$\limsup_{i\to\infty} \beta_i \le \beta_N < \beta + \varepsilon.$$

On the other hand, we must have $\beta \le \beta_i$ for every i, which implies
that $\beta \le \liminf_{i\to\infty} \beta_i$. Thus for any $\varepsilon > 0$ we have

$$\beta \le \liminf_{i\to\infty} \beta_i \le \limsup_{i\to\infty} \beta_i < \beta + \varepsilon,$$

which proves the result. ∎
The proof of the following principal result of this section is
fashioned after [2].

THEOREM 4. Let K be a finite set, and let the functions
$\{f(\cdot,k) \mid k \in K\}$ be convex on $R^m$. Let g be defined on $R^m$ by
$g(x) = \max\{f(x,k) \mid k \in K\}$, and let $I(x) = \{k \in K \mid f(x,k) = g(x)\}$.
Then for each x and d the directional derivative $D_d g(x)$ is finite,
and $D_d g(x) = \max\{D_d f(x,k) \mid k \in I(x)\}$.


Proof. Choose x and d. Since g is convex, the directional
derivative $D_d g(x)$ exists and is finite. Since $D_{\alpha d}\, g(x) = \alpha D_d g(x)$
whenever $\alpha > 0$, it suffices to prove the result for the case
$\|d\| = 1$. Let $x_i = x + \alpha_i d$, where $0 < \alpha_{i+1} \le \alpha_i$ and $\lim_{i\to\infty} \alpha_i = 0$.
Then for each i we have $(x_i - x)/\|x_i - x\| = d$ (since
$\|d\| = 1$, we can interpret the components of d as direction cosines).
Choose $k_i$ in $I(x_i)$ and choose k in $I(x)$.

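For each i we have

$$\frac{g(x_i) - g(x)}{\alpha_i} = \frac{f(x_i,k_i) - f(x,k)}{\alpha_i} \ge \frac{f(x_i,k) - f(x,k)}{\alpha_i}\ \ (\text{since } k_i \in I(x_i))\ \ \ge \frac{\langle x^*, x_i - x\rangle}{\alpha_i}$$

for every $x^*$ in $\partial f(x,k)$, the subdifferential of $f(\cdot,k)$ at x.

Since $(x_i - x)/\alpha_i = d$ for every i, we have

$$\frac{g(x_i) - g(x)}{\alpha_i} \ge \langle x^*, d\rangle \quad \text{for every } x^* \text{ in } \partial f(x,k),$$

or equivalently,

$$\frac{g(x_i) - g(x)}{\alpha_i} \ge \sup\{\langle x^*, d\rangle \mid x^* \in \partial f(x,k)\} = D_d f(x,k).$$

Since this holds for each i and each k in I(x), it follows that

$$D_d g(x) = \lim_{i\to\infty} \frac{g(x_i) - g(x)}{\alpha_i} \ge \max\{D_d f(x,k) \mid k \in I(x)\}.$$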


To prove the reverse inequality, we first observe that, since
K is finite, for some subsequence $\{x_j\}$ (where $j \to +\infty$) and some $k^0$
in K we have $k^0 \in I(x_j)$ for every j. Since $f(\cdot,k)$ and g are continuous
functions for each k, it follows that

$$g(x) = \lim_{j\to\infty} g(x_j) = \lim_{j\to\infty} f(x_j,k^0) = f(x,k^0).$$

Therefore, $k^0 \in I(x)$. For each j, we have

$$\frac{g(x_j) - g(x)}{\alpha_j} = \frac{f(x_j,k^0) - f(x,k^0)}{\alpha_j} \le \frac{\langle x_j^*, x_j - x\rangle}{\alpha_j} = \langle x_j^*, d\rangle$$

for every $x_j^*$ in $\partial f(x_j,k^0)$. That is,

$$\frac{g(x_j) - g(x)}{\alpha_j} \le \sup\{\langle x_j^*, d\rangle \mid x_j^* \in \partial f(x_j,k^0)\} = D_d f(x_j,k^0).$$
J J J ^ J 
Since $\partial f(\cdot,k^0)$ is closed at x and uniformly bounded near x (Lemma 8), it
follows from Lemmas 2 and 4 that (for fixed d) the directional derivative
$D_d f(\cdot,k^0)$ is upper semicontinuous at x. Therefore, by Lemma 9,
we have

$$D_d g(x) = \lim_{i\to\infty} \frac{g(x_i) - g(x)}{\alpha_i} = \lim_{j\to\infty} \frac{g(x_j) - g(x)}{\alpha_j} \le \limsup_{j\to\infty} D_d f(x_j,k^0) \le D_d f(x,k^0) \le \max\{D_d f(x,k) \mid k \in I(x)\},$$

where the final inequality follows as $k^0 \in I(x)$. Thus we have shown

$$\max\{D_d f(x,k) \mid k \in I(x)\} \le D_d g(x) \le \max\{D_d f(x,k) \mid k \in I(x)\},$$

which proves the theorem. ∎
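The formula of the theorem just proved can be compared with a one-sided difference quotient. The sketch below assumes two hypothetical differentiable convex pieces, $f(x,1) = x_1^2 + x_2$ and $f(x,2) = x_1 - x_2$, so that $D_d f(x,k)$ is the inner product of the gradient with d; the active set $I(x)$ is computed with a small floating-point tolerance, which is our own choice. At $x = (0,0)$ both pieces are active; at $x = (2,0)$ only the first is.

```python
# Hypothetical pieces (K = {1, 2}), both convex and differentiable:
#   f(x, 1) = x1**2 + x2   and   f(x, 2) = x1 - x2.
PIECES = [
    (lambda x: x[0] ** 2 + x[1], lambda x: [2.0 * x[0], 1.0]),
    (lambda x: x[0] - x[1], lambda x: [1.0, -1.0]),
]


def g(x):
    """g(x) = max_k f(x, k)."""
    return max(fk(x) for fk, _ in PIECES)


def dir_deriv_formula(x, d, tol=1e-12):
    """max{D_d f(x, k) : k in I(x)}; for a differentiable piece D_d f(x, k) = <grad f(x, k), d>."""
    gx = g(x)
    active = [grad for fk, grad in PIECES if fk(x) >= gx - tol]   # the index set I(x)
    return max(sum(gi * di for gi, di in zip(grad(x), d)) for grad in active)


def dir_deriv_numeric(x, d, a=1e-7):
    """One-sided difference quotient (g(x + a d) - g(x)) / a."""
    xa = [xi + a * di for xi, di in zip(x, d)]
    return (g(xa) - g(x)) / a


for x, d in (([0.0, 0.0], [1.0, 1.0]), ([0.0, 0.0], [1.0, -1.0]), ([2.0, 0.0], [0.0, 1.0])):
    print(x, d, dir_deriv_formula(x, d), dir_deriv_numeric(x, d))
```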


5.  The  Global  Convergence  Theorem 

In Sections 2 and 3, we considered the family $\{C(s) \mid s \in S\}$
of constrained minimization problems, where the perturbation space S
is a compact subset of $R^n$. Suppose now that, for each s, C(s) is
easily constructed from C, C(s) resembles C, and C(s) is easier
to solve than C. We may then regard C(s) as an approximating
subproblem, and expect that its solution helps us to solve C. Indeed,
we will show that, under appropriate hypotheses, solving C(s)
generates a descent direction of $\theta_\rho$, the exact penalty function for
the primal C.


In primal approximation methods, the perturbation s always
supplies an estimate of a primal solution, and may also supply other
information, such as an approximation of the Hessian of the Lagrangian.
Accordingly, we will write $S = Y \times W$, where $Y \subset R^m$ and $W \subset R^q$
for some $q \ge 0$. If $s \in S$, we will write s = (y,w), where $y \in Y$
and $w \in W$. By q = 0 we mean $W = \emptyset$, in which case we disregard W
and w, so that $(y,w) \in Y \times W$ will mean $y \in Y$. For each k = 0,1,...,p
we will write $f_k(x,y,w)$ instead of $f_k(x,s)$. We will also write
C(s) as C(y,w):

$$\begin{aligned}\underset{x}{\text{minimize}}\quad & f_0(x,y,w)\\ \text{subject to}\quad & f_k(x,y,w) \le 0,\quad k = 1,2,\ldots,p.\end{aligned}$$

In constructing C(y,w), we will always set y equal to the current
estimate of a primal solution, while w may be any arbitrary element
of the compact, possibly empty, set W.

For instance, we can generate quadratic subproblems as follows [7]. Suppose the functions $\{f_k\}$ defining C are continuously differentiable. Let $a$ and $b$ be positive numbers, and let $\mathcal{G}$ be the collection of symmetric $m\times m$ matrices $G$ satisfying

$$a\|x\|^2 \le \langle x, Gx\rangle \le b\|x\|^2 \quad\text{for every } x.$$

Let $x_i$ be the current estimate of a primal solution and let $G_i\in\mathcal{G}$ be the current approximation of the Hessian of the Lagrangian. We form the quadratic subproblem $QP(x_i,G_i)$ by defining

$$\hat f_0(x,x_i,G_i) = f_0(x_i) + \langle\nabla f_0(x_i),\, x-x_i\rangle + \tfrac12\langle x-x_i,\, G_i(x-x_i)\rangle$$

and

$$\hat f_k(x,x_i,G_i) = f_k(x_i) + \langle\nabla f_k(x_i),\, x-x_i\rangle, \qquad k=1,2,\dots,p.$$
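The construction of $QP(x_i,G_i)$ can be sketched in plain Python. This is a minimal sketch, not the report's code: the helper names (`qp_objective`, `linearized_constraint`, `in_class_2x2`) and the sample constraint are hypothetical, and the membership test for $\mathcal{G}$ assumes $m=2$. It uses the fact that, for a symmetric $G$, the condition $a\|x\|^2\le\langle x,Gx\rangle\le b\|x\|^2$ holds exactly when every eigenvalue of $G$ lies in $[a,b]$. The usage lines check two hypotheses needed later: $\hat f_k(y,y,w)=f_k(y)$, and $\hat f_k(x,y,w)\le f_k(x)$, which holds for the linearizations because each $f_k$ is convex.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def matvec(G, v):
    return [dot(row, v) for row in G]

def in_class_2x2(G, a, b):
    """Test a*||x||^2 <= <x, Gx> <= b*||x||^2 for all x, with G symmetric 2x2.

    For symmetric G this holds exactly when both eigenvalues lie in [a, b].
    """
    p, q, r = G[0][0], G[0][1], G[1][1]
    mid = (p + r) / 2.0
    radius = math.sqrt(((p - r) / 2.0) ** 2 + q ** 2)
    return a <= mid - radius and mid + radius <= b

def qp_objective(f0, grad_f0, xi, Gi):
    """hat f_0(x) = f0(xi) + <grad f0(xi), x - xi> + 1/2 <x - xi, Gi (x - xi)>."""
    f_xi, g_xi = f0(xi), grad_f0(xi)
    def fhat0(x):
        s = sub(x, xi)
        return f_xi + dot(g_xi, s) + 0.5 * dot(s, matvec(Gi, s))
    return fhat0

def linearized_constraint(fk, grad_fk, xi):
    """hat f_k(x) = fk(xi) + <grad fk(xi), x - xi>; it does not depend on Gi."""
    f_xi, g_xi = fk(xi), grad_fk(xi)
    def fhatk(x):
        return f_xi + dot(g_xi, sub(x, xi))
    return fhatk

# Usage with a hypothetical convex constraint f1(x) = x1^2 + x2^2 - 1.
def f1(x):
    return x[0] ** 2 + x[1] ** 2 - 1.0

def grad_f1(x):
    return [2.0 * x[0], 2.0 * x[1]]

xi = [2.0, 0.5]
fhat1 = linearized_constraint(f1, grad_f1, xi)
print(fhat1(xi) == f1(xi))                                # True: exact at y = x_i
print(fhat1([0.0, 0.0]) <= f1([0.0, 0.0]))                # True: underestimates f1
print(in_class_2x2([[2.0, 0.5], [0.5, 1.0]], 0.5, 3.0))   # True
```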

Returning to the general case, the following theorem shows the crucial role of each approximating subproblem $C(y,w)$. If $\hat f: R^m\times R^m\times R^q\to R$, we denote by $\partial_1\hat f(\cdot,y,w)$ the subdifferential map with respect to the first argument, so that $\partial_1\hat f(x,y,w)\subset R^m$.


THEOREM 5. Suppose program C is superconsistent. Let $\hat f_0,\hat f_1,\dots,\hat f_p$ be functions jointly continuous on $R^m\times R^m\times R^q$ such that for each fixed $y$, $w$, and $k=0,1,\dots,p$ the function $\hat f_k(\cdot,y,w)$ is convex and $\partial_1\hat f_k(y,y,w)=\partial f_k(y)$, and such that for each $x$, $y$, $w$ and $k=1,2,\dots,p$ we have $\hat f_k(x,y,w)\le f_k(x)$ and $\hat f_k(y,y,w)=f_k(y)$. Let $Y$ be a nonempty and compact subset of $R^m$ and let $W$ be a compact subset of $R^q$ such that program $C(y,w)$ has the unique solution $z(y,w)$ whenever $(y,w)\in Y\times W$. Let $d(y,w)=z(y,w)-y$. Then there is a positive number $\rho_0$ such that $D_{d(y,w)}\theta_\rho(y)<0$ whenever $\rho>\rho_0$, $(y,w)\in Y\times W$, and $d(y,w)\ne 0$.

Proof. We claim that the value $\rho_0=\max(\rho_1,\rho_2)$ specified in Corollary 3.1, with $S=Y\times W$, is a satisfactory choice. To see this, choose $(y,w)$ in $Y\times W$ and $\rho>\rho_0$. Let $z=z(y,w)$, the unique solution of $C(y,w)$. By Corollary 3.1, $z$ is also the unique minimum of $\hat\theta_\rho(\cdot,y,w)$, the exact penalty function for $C(y,w)$. If $z\ne y$, then $\hat\theta_\rho(z,y,w)<\hat\theta_\rho(y,y,w)$. It follows that for each $y^*$ in $\partial_1\hat\theta_\rho(y,y,w)$ we have

$$0>\hat\theta_\rho(z,y,w)-\hat\theta_\rho(y,y,w)\ge\langle y^*,z-y\rangle=\langle y^*,d\rangle,$$

where $d=z-y$. Hence, by Lemma 4, we have

$$D_d\hat\theta_\rho(y,y,w)=\max\{\langle y^*,d\rangle\mid y^*\in\partial_1\hat\theta_\rho(y,y,w)\}<0.$$

Since $\partial_1\hat f_k(y,y,w)=\partial f_k(y)$ for $k=0,1,\dots,p$ and $\hat f_k(y,y,w)=f_k(y)$ for $k=1,2,\dots,p$, it follows from Lemmas 5 and 4 and Theorem 4 that $D_d\theta_\rho(y)=D_d\hat\theta_\rho(y,y,w)<0$, which proves the theorem. □
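The role of the threshold $\rho_0$ can be seen in a small experiment. The sketch below is an illustration under stated assumptions, not taken from the report: it uses a hypothetical one-variable instance (`f0`, `f1`, solution $z=2$), a fixed curvature $G=2$ in the quadratic subproblem, and a closed-form subproblem solver `solve_qp_1d` that exists only for this example. It evaluates the one-sided directional derivative of $\theta_\rho(x)=f_0(x)+\rho\max(0,f_1(x))$ along $d=z-y$ using the active-set formula of Theorem 4. At the infeasible point $y=2.5$ the direction is a descent direction for $\rho=10$ but not for $\rho=0.05$.

```python
# Hypothetical one-variable instance (not from the report):
#   minimize f0(x) = (x - 3)^2  subject to  f1(x) = x^2 - 4 <= 0,  solution z = 2.

def f0(x):
    return (x - 3.0) ** 2

def df0(x):
    return 2.0 * (x - 3.0)

def f1(x):
    return x * x - 4.0

def df1(x):
    return 2.0 * x

def solve_qp_1d(y, G):
    """Unique solution of the quadratic subproblem built at y with curvature G > 0:

        minimize   f0(y) + f0'(y)(x - y) + (G/2)(x - y)^2
        subject to f1(y) + f1'(y)(x - y) <= 0.

    In one variable the minimizer is the unconstrained model minimizer clipped
    to the half-line defined by the linearized constraint.
    """
    x = y - df0(y) / G
    a, c = df1(y), f1(y)          # constraint reads c + a (x - y) <= 0
    if a > 0:
        x = min(x, y - c / a)
    elif a < 0:
        x = max(x, y - c / a)
    elif c > 0:
        raise ValueError("linearized constraint is infeasible")
    return x

def dir_deriv_penalty(y, d, rho):
    """One-sided derivative of theta_rho(x) = f0(x) + rho*max(0, f1(x)) at y along d."""
    slope1 = df1(y) * d
    if f1(y) > 0:
        pen = slope1              # constraint violated: max(0, f1) = f1 near y
    elif f1(y) == 0:
        pen = max(0.0, slope1)    # constraint active: max over both pieces
    else:
        pen = 0.0                 # constraint strictly satisfied
    return df0(y) * d + rho * pen

y = 2.5                           # infeasible point, f1(2.5) = 2.25 > 0
d = solve_qp_1d(y, 2.0) - y       # z = 2.05, so d is about -0.45
print(dir_deriv_penalty(y, d, 10.0) < 0)   # True: d is a descent direction
print(dir_deriv_penalty(y, d, 0.05) < 0)   # False: rho is too small here
```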

The  heart  of  our  convergence  theorem  is  the  following  slight 
generalization  of  Zangwill's  convergence  theorem  [32]. 

LEMMA 10. Let $Y$ be a nonempty and compact subset of $R^m$ and let $W$ be a compact subset of $R^q$. Let $\Gamma: Y\times W\to Y\times W$ be a point-to-set map. Suppose an algorithm generates the sequence $\{(x_i,w_i)\}$ according to the recursion $(x_{i+1},w_{i+1})\in\Gamma(x_i,w_i)$, where $(x_0,w_0)$ is given.

Suppose that

1) there is a continuous function $\theta: Y\to R$ such that

i) if $x$ minimizes $\theta$, then the algorithm stops at $x$;
ii) if $x$ does not minimize $\theta$, then whenever $(y,u)\in\Gamma(x,w)$ we have $\theta(y)<\theta(x)$;

2) $\Gamma$ is closed on $Y\times W$.

Then either the algorithm stops at some point $(z,w)$ such that $z$ minimizes $\theta$, or some subsequence converges to some $(z,w)$ such that $z$ minimizes $\theta$.

Proof. The proof does not differ significantly from that in [32], and will be omitted. □

Though Zangwill's result guarantees only subsequential convergence, the following lemma provides a sufficient condition for the entire sequence $\{x_i\}$ to converge.

LEMMA 11. Let $f$ be a convex function with the unique minimum $z$. If the sequence $\{x_i\}$ satisfies $\lim_{i\to\infty}f(x_i)=f(z)$, then $\lim_{i\to\infty}x_i=z$.

Proof. See Corollary 27.2.2, Rockafellar [26].

LEMMA 12. Let $M_1: X\to Y$ and $M_2: Y\to Z$ be point-to-set maps. Let the composition map $M_2M_1: X\to Z$ be defined by

$$M_2M_1(x)=\bigcup\{M_2(y)\mid y\in M_1(x)\}.$$

Suppose that $M_1$ is closed at $x$ and $M_2$ is closed on $M_1(x)$. If $Y$ is compact, then $M_2M_1$ is closed at $x$.

Proof. See Corollary 4.2.1, Zangwill [32]. □

LEMMA 13. Let $f: R^m\to R$ be a continuous function and let the point-to-set map $M: R^m\times R^m\to R^m$ be defined by

$$M(x,d)=\Big\{y=x+\bar\alpha d \;\Big|\; 0\le\bar\alpha\le\beta,\ f(y)=\min_{0\le\alpha\le\beta}f(x+\alpha d)\Big\},$$

where $\beta$ is a fixed positive number. Then $M$ is closed on $R^m\times R^m$.

Proof. See Lemma 5.1, Zangwill [32]. □

LEMMA 14. Let $f_1,\dots,f_p$ be convex functions such that $\{x\mid f_k(x)\le 0,\ k=1,2,\dots,p\}$ is nonempty and bounded. Then for each real number $a$ the level set $\{x\mid\sum_{k=1}^p\max(0,f_k(x))\le a\}$ is compact if it is nonempty.

Proof. See the corresponding lemma in Han [?].

We now define the algorithm. Let the positive numbers $\rho$ and $\beta$, the nonempty and compact set $T\subset R^m$, and the compact set $W\subset R^q$ be given. Choose any $(x_0,w_0)$ in $T\times W$. Consider the following idealized algorithm.


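Algorithm G: For $i=0,1,2,\dots$

1) solve $C(x_i,w_i)$ to obtain a solution $z_i$; let $d_i=z_i-x_i$;

2) find an $\alpha_i$ such that $\theta_\rho(x_i+\alpha_i d_i)=\min\{\theta_\rho(x_i+\alpha d_i)\mid 0\le\alpha\le\beta\}$; let $x_{i+1}=x_i+\alpha_i d_i$;

3) stop if $z_i=x_i$; otherwise, return to 1) with $x_{i+1}$ replacing $x_i$ and any $w\in W$ replacing $w_i$.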
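An idealized run of the algorithm can be sketched as follows. This is an illustration under stated assumptions, not the report's implementation: the problem is a hypothetical one-variable instance, $w$ is held fixed as the curvature $G=2$ (the algorithm permits any $w\in W$ at each step), and the exact line search of step 2 is approximated by golden-section search, which is appropriate here because $\alpha\mapsto\theta_\rho(x_i+\alpha d_i)$ is convex and therefore unimodal on $[0,\beta]$. The values $\rho=10$ and $\beta=1$, and all function names, are arbitrary choices for the example.

```python
import math

def golden_section(phi, lo, hi, iters=100):
    """Approximate minimizer of a unimodal (here convex) function phi on [lo, hi]."""
    r = (math.sqrt(5.0) - 1.0) / 2.0          # inverse golden ratio, about 0.618
    a, b = lo, hi
    c, d = b - r * (b - a), a + r * (b - a)
    fc, fd = phi(c), phi(d)
    for _ in range(iters):
        if fc <= fd:                          # a minimizer lies in [a, d]
            b, d, fd = d, c, fc
            c = b - r * (b - a)
            fc = phi(c)
        else:                                 # a minimizer lies in [c, b]
            a, c, fc = c, d, fd
            d = a + r * (b - a)
            fd = phi(d)
    return (a + b) / 2.0

def algorithm_g(x0, solve_sub, theta, beta, tol=1e-10, max_iter=100):
    """Idealized Algorithm G in one variable, with w held fixed."""
    x = x0
    for _ in range(max_iter):
        d = solve_sub(x) - x                  # step 1: d_i = z_i - x_i
        if abs(d) <= tol:                     # step 3: stop if z_i = x_i
            return x
        alpha = golden_section(lambda s: theta(x + s * d), 0.0, beta)  # step 2
        x = x + alpha * d
    return x

# Hypothetical instance: minimize (x - 3)^2 subject to x^2 - 4 <= 0 (solution z = 2).
def f0(x):
    return (x - 3.0) ** 2

def f1(x):
    return x * x - 4.0

def solve_qp_1d(y, G=2.0):
    """Quadratic subproblem at y with fixed curvature G (the fixed w).

    Assumes the linearized constraint is satisfiable, as it is for this instance.
    """
    x = y - 2.0 * (y - 3.0) / G               # unconstrained model minimizer
    a, c = 2.0 * y, f1(y)                     # linearized constraint c + a (x - y) <= 0
    if a > 0:
        x = min(x, y - c / a)
    elif a < 0:
        x = max(x, y - c / a)
    return x

rho, beta = 10.0, 1.0

def theta(x):
    """Exact penalty function theta_rho(x) = f0(x) + rho*max(0, f1(x))."""
    return f0(x) + rho * max(0.0, f1(x))

print(algorithm_g(0.0, solve_qp_1d, theta, beta))   # approximately 2.0
```

Starting from $x_0=0$, the first subproblem gives $z_0=3$ and the line search stops near the kink of $\theta_\rho$ at $x=2$; the next subproblem then returns $z_1\approx x_1$, so the run stops there.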


We  may  now  prove  the  global  convergence  theorem. 


THEOREM 6. Suppose that program C is superconsistent, that its objective function is bounded below, that its feasible region $\{x\mid f_k(x)\le 0,\ k=1,2,\dots,p\}$ is bounded, and that it has the unique solution $z$. Let $\hat f_0,\hat f_1,\dots,\hat f_p$ be functions jointly continuous on $R^m\times R^m\times R^q$ such that for each fixed $y$, $w$, and $k=0,1,\dots,p$ the function $\hat f_k(\cdot,y,w)$ is convex and $\partial_1\hat f_k(y,y,w)=\partial f_k(y)$, and such that for each $x$, $y$, $w$, and $k=1,2,\dots,p$ we have $\hat f_k(x,y,w)\le f_k(x)$ and $\hat f_k(y,y,w)=f_k(y)$. Suppose that program $C(y,w)$ has a unique solution whenever $(y,w)\in R^m\times W$. Then there is a positive number $\rho_0$ such that, whenever $\rho>\rho_0$, Algorithm G either stops at the unique solution $z$ or $\lim_{i\to\infty}x_i=z$.


Proof. By Corollary 3.2, there is a positive number $\rho_1$ such that $z$ is also the unique minimum of $\theta_\rho$ whenever $\rho>\rho_1$. Let $f_0$ be bounded below on $R^m$ by $-\sigma$, and let

$$Y=\Big\{x\;\Big|\;\sum_{k=1}^p\max(0,f_k(x))\le\max_{t\in T}\Big[\frac{1}{\rho_1}\big(f_0(t)+\sigma\big)+\sum_{k=1}^p\max(0,f_k(t))\Big]\Big\}.$$

Clearly, we have $T\subset Y$. Also, since $\sum_{k=1}^p\max(0,f_k(z))=0$, we have $z\in Y$. By Lemma 14, $Y$ is compact.

Let $S=Y\times W$ if $W\ne\emptyset$, and let $S=Y$ if $W=\emptyset$. Applying Corollary 3.1 to the compact set $S$, we conclude that there is a positive number $\rho_2$ such that the minimum set of $\hat\theta_\rho(\cdot,s)$ and the solution set of $C(s)$ coincide whenever $\rho>\rho_2$ and $s\in S$. By hypothesis, this common set is the singleton $\{z(s)\}$. Let $\rho_0=\max(1,\rho_1,\rho_2)$. We will show that $\rho_0$ is the desired constant.


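Choose $(x_0,w_0)$ in $T\times W$, choose $\rho>\rho_0$, and let $Y_0=\{x\mid\theta_\rho(x)\le\theta_\rho(x_0)\}$. Notice that $z\in Y_0$, since $\rho>\rho_1$. We will show inductively that, if Algorithm G generates the sequence $\{(x_i,w_i)\}$ (possibly a finite sequence), then $\{x_i\}\subset Y_0\cap Y$. This is clearly true for $x_0$. Suppose $x_j\in Y_0\cap Y$ for $j=0,1,\dots,i$, and let $z_i=z(x_i,w_i)$.

Suppose first that $z_i=x_i$. Then $x_{i+1}=x_i$, and the algorithm stops. On the other hand, suppose that $z_i\ne x_i$. By Theorem 5, $d_i=z_i-x_i$ is a descent direction for $\theta_\rho$ at $x_i$. Therefore, the line search must generate an $x_{i+1}$ such that $\theta_\rho(x_{i+1})<\theta_\rho(x_i)$. From the induction hypothesis, we have $\theta_\rho(x_i)\le\theta_\rho(x_0)$. Hence $\theta_\rho(x_{i+1})<\theta_\rho(x_i)\le\theta_\rho(x_0)$, so $x_{i+1}\in Y_0$. Since $f_0(x_{i+1})\ge-\sigma$, it follows that

$$\begin{aligned}
\sum_{k=1}^p\max(0,f_k(x_{i+1})) &= \frac{1}{\rho}\big[\theta_\rho(x_{i+1})-f_0(x_{i+1})\big]\\
&< \frac{1}{\rho}\big[\theta_\rho(x_0)+\sigma\big] = \frac{1}{\rho}\big(f_0(x_0)+\sigma\big)+\sum_{k=1}^p\max(0,f_k(x_0))\\
&\le \frac{1}{\rho_1}\big(f_0(x_0)+\sigma\big)+\sum_{k=1}^p\max(0,f_k(x_0)) \qquad(\text{since } f_0(x_0)+\sigma\ge 0 \text{ and } \rho>\max(1,\rho_1))\\
&\le \max_{t\in T}\Big[\frac{1}{\rho_1}\big(f_0(t)+\sigma\big)+\sum_{k=1}^p\max(0,f_k(t))\Big] \qquad(\text{since } x_0\in T).
\end{aligned}$$

Therefore, $x_{i+1}\in Y_0\cap Y$.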

Let the map $D: S\to R^m\times R^m$ be defined by $D(s)=\{(x,z(s)-x)\}$, where $s=(x,w)$, and let the map $L: R^m\times R^m\to S$ be defined by

$$L(x,d)=\Big\{(x+\bar\alpha d,w)\;\Big|\;0\le\bar\alpha\le\beta,\ \theta_\rho(x+\bar\alpha d)=\min_{0\le\alpha\le\beta}\theta_\rho(x+\alpha d),\ w\in W\Big\}.$$

Lastly, let the composition map $\Gamma: S\to S$ be defined by $\Gamma=LD$. Clearly, if $s_i\in S$, then Algorithm G generates the point $s_{i+1}$ only if $s_{i+1}\in\Gamma(s_i)$.

We will verify that the hypotheses of Zangwill's Convergence Theorem (Lemma 10) are satisfied for the point-to-set map $\Gamma$ and the continuous function $\theta_\rho$. Indeed, we have already shown that $\Gamma: S\to S$ and that $S$ is compact.

Suppose that, for some $i$, the point $x_i$ minimizes $\theta_\rho$. Since $\rho>\rho_1$, it follows by Corollary 3.2 that $x_i$ also solves C. By Lemma 6, there is a vector $u$ of Lagrange multipliers such that $(x_i,u)$ is a K.K.T. pair for C. Since $\hat f_k(x_i,x_i,w_i)=f_k(x_i)$ for $k=1,2,\dots,p$ and $\partial_1\hat f_k(x_i,x_i,w_i)=\partial f_k(x_i)$ for $k=0,1,\dots,p$, it follows that $(x_i,u)$ is also a K.K.T. pair for $C(x_i,w_i)$ for any $w_i$ in $W$. Therefore $x_i=z_i$, the unique solution of $C(x_i,w_i)$. Hence $d_i=z_i-x_i=0$, and the algorithm stops.

On the other hand, suppose that $x_i$ does not minimize $\theta_\rho$. Reasoning as above, it follows that $x_i$ does not solve C, and hence $x_i$ does not solve $C(x_i,w_i)$ for any $w_i$ in $W$. Therefore, the unique solution $z_i$ of $C(x_i,w_i)$ must satisfy $z_i\ne x_i$. Since we assumed that an exact line search is executed over a nonempty interval, it follows from Theorem 5 that $\theta_\rho(x)<\theta_\rho(x_i)$ whenever $(x,w)\in\Gamma(x_i,w_i)$.


Let $\Omega: S\to R^m$ be defined by $\Omega(s)=\{z(s)\}$. Then $\Omega$ is closed on $S$ and uniformly bounded near $S$, by Lemma 1. Therefore, by Lemma 2, for some finite number $b$ we have $\sup\{\|z(s)\|\mid s\in S\}\le b$. Let $B=\{y\in R^m\mid\|y\|\le b\}$. Then, for each $s=(x,w)$ in $S$, the pair $(s,z(s)-x)$ is contained in the compact set $S\times(B-Y)$. Therefore, by Lemmas 1, 12, and 13, the map $\Gamma$ is closed on $S$.

Thus the hypotheses of Lemma 10 are satisfied; we conclude that either the algorithm stops at $z$ or some subsequence of $\{x_i\}$ converges to $z$. Since the entire sequence $\{\theta_\rho(x_i)\}$ is monotone decreasing, by Lemma 9 we have

$$\lim_{i\to\infty}\theta_\rho(x_i)=\theta_\rho(z).$$

It follows from Lemma 11 that $\lim_{i\to\infty}x_i=z$, which proves the theorem. □
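A worked one-variable example, chosen here for illustration and not taken from the report, shows why a threshold such as $\rho_1$ is needed. For the hypothetical program: minimize $(x-3)^2$ subject to $x^2-4\le 0$, the solution is $z=2$, with Lagrange multiplier $u=1/2$ (from $2(2-3)+u\cdot 2\cdot 2=0$). The exact penalty function is

$$\theta_\rho(x)=(x-3)^2+\rho\max(0,\,x^2-4).$$

For $x\le 2$, $\theta_\rho$ is decreasing: on $[-2,2]$ it equals $(x-3)^2$, and for $x<-2$ its derivative $2(x-3)+2\rho x$ is negative. For $x>2$,

$$\theta_\rho'(x)=2(x-3)+2\rho x=0 \iff x=\frac{3}{1+\rho}, \qquad \frac{3}{1+\rho}>2 \iff \rho<\tfrac12 .$$

Hence

$$\operatorname*{arg\,min}_x\theta_\rho(x)=\begin{cases}\dfrac{3}{1+\rho}, & 0\le\rho<\tfrac12,\\[4pt] 2, & \rho\ge\tfrac12,\end{cases}$$

so in this instance $z$ minimizes $\theta_\rho$ exactly when $\rho\ge u=1/2$, while for smaller $\rho$ the minimizer of $\theta_\rho$ is infeasible.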


6.  Concluding  Remarks 

It is clear from the proof of Theorem 6 that $\rho_0$ depends on $T$ and $W$ but not on $\beta$. Although the theorem holds for each positive number $\beta$, in practice $\beta$ should be chosen suitably large in the hope of ensuring that the line search terminates because the minimum is reached, and not because the upper bound $\beta$ is encountered. Such a choice could only speed the overall convergence.

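The requirement that $f_0$ be bounded below can always be met by replacing $f_0$ with $\exp(f_0)$, which is bounded below by zero; since $\exp$ is convex and increasing, $\exp(f_0)$ is again convex. If C has a unique solution, then C will also have a unique solution when $\exp(f_0)$ replaces $f_0$, because $\exp$ is strictly increasing and the feasible region is unchanged.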
If C has the unique solution $z$, then in theory we can always ensure that the feasible region is bounded by imposing the single additional constraint $\langle x,x\rangle\le c$, where $c>\langle z,z\rangle$. In practice, a very large value of $c$ should be used. Alternatively, we could bound the feasible region with linear constraints.

The most restrictive hypothesis is the requirement that each $C(y,w)$ possess a unique solution. For quadratic subproblems, this is accomplished by using a positive definite matrix. Notice that global convergence is assured even if one fixed positive definite matrix is used for every quadratic subproblem. However, the local properties of the algorithm will then suffer.

To study the local behavior of a recursive substitution scheme, it is usual to make strong assumptions, including the requirement that each $f_k$ and $\hat f_k$ be twice differentiable, and that a good estimate of a Karush-Kuhn-Tucker pair $(z,u)$ be available. Under such conditions, analysis of recursive substitution schemes utilizing quadratic subproblems [5,6,22], or arbitrary approximating subproblems in a one-point scheme [25], has shown that near $(z,u)$ the line search can be omitted, and that the resulting pure recursive substitution scheme generates a sequence $\{(x_i,v_i)\}$ that converges to $(z,u)$. Moreover, a linear, superlinear, or quadratic convergence rate is possible, depending on second-order conditions. Notice that the multiplier estimates $v_i$ play a crucial role in the local analysis, yet are not explicitly considered in our global convergence theorem.




We hope that our global convergence result, motivated by the need to validate an algorithm for geometric programming [28], will inspire additional work on non-quadratic subproblems. In addition, the results in Sections 2 and 3 suggest a way to solve convex programs with nondifferentiable constraints: namely, minimize the exact penalty function associated with the program, using any available algorithm for minimizing a nondifferentiable convex function [1,12],